fix: preserve the order of the keys #211

zookzook · 2023-09-18T09:05:44Z

No description provided.

rhcarvalho · 2023-09-18T16:43:12Z

@zookzook do you prefer to discuss here or back at #211?

I'll provide here some initial feedback related to the present implementation.

First off, thanks for taking the time to implement this extra functionality ❤️

Setup

Here's how I set this up on a Livebook:

Mix.install(
  [
    {:csv, "~> 3.0"},
    {:exqlite, "~> 0.13.15"},
    {:kino_db, "~> 0.2.2"},
    {:kino_explorer, "~> 0.1.10"},
    {:kino,
     github: "livebook-dev/kino", ref: "06625859ef367c50169a280ae5fe33cf9a49a8f3", override: true},
    {:mongodb_driver, github: "zookzook/elixir-mongodb-driver", ref: "preserve-order"},
    {:tzdata, "~> 1.1"}
  ],
  config: [
    elixir: [time_zone_database: Tzdata.TimeZoneDatabase],
    mongodb_driver: [decoder: BSON.PreserverOrderDecoder]
  ]
)

Query with explicit projection

And an example query:

Mongo.find(mongo, "Message", %{},
sort: %{_id: -1}, 
limit: 10, 
projection: [_id: 0, from: "$user.name", at: "$created_at", message: "$data"])
# |> Enum.to_list()
|> Enum.map(fn doc -> 
  doc
  |> Map.get(:order)
  |> Enum.map(fn key -> {String.to_atom(key), Map.get(doc, key)} end)
end)

Since I pass a projection using a keyword list, I expect and get back the :order list in the requested order, and all documents have the very same order. Worked great 👍

I know generating atoms dynamically has its perils, but in this case it seems under control and does improve readability of the output: [[foo: "bar"], ...] instead of [[{"foo", "bar"}], ...]

Query without a projection

Next query, without a projection:

Mongo.find(mongo, "Message", %{},
sort: %{_id: -1}, 
limit: 10)
# |> Enum.to_list()
|> Enum.map(fn doc -> 
  doc
  |> Map.get(:order)
  |> Enum.map(fn key -> {String.to_atom(key), Map.get(doc, key)} end)
end)

The query above represents better what I was originally looking for: exploring a collection and getting the fields in the order that MongoDB has them stored.

This query reviewed an obvious shortcoming of my code to transform the result: it should be applied recursively to handle subdocuments.

This got me thinking maybe I could write a custom decoder that handles recursion while parsing the response from Mongo, and such that I don't need to post-process query results. @zookzook what do you think?

zookzook · 2023-09-19T06:40:05Z

This got me thinking maybe I could write a custom decoder that handles recursion while parsing the response from Mongo, and such that I don't need to post-process query results. @zookzook what do you think?

Keep in mind, that the result of the decoder is used by the driver. If you would change the result by converting the binaries into atoms the whole driver will not work anymore.

Try to use the collection or write just a little recursive function that converts the data into the desired format.

rhcarvalho · 2023-09-19T08:09:57Z

This got me thinking maybe I could write a custom decoder that handles recursion while parsing the response from Mongo, and such that I don't need to post-process query results. @zookzook what do you think?

Keep in mind, that the result of the decoder is used by the driver. If you would change the result by converting the binaries into atoms the whole driver will not work anymore.

I'm not sure I fully understand the implications. Could you give me a code example (perhaps a link to library source on GitHub) where this is happening?

Try to use the collection or write just a little recursive function that converts the data into the desired format.

I read https://hexdocs.pm/mongodb_driver/1.1.0/Mongo.Collection.html, and understood that to use Mongo.Collection effectively I'd need to declare all fields of my documents upfront, for every collection, defeating the idea of reading and printing the documents from Mongo the way (order of fields) they are.

I'll try the recursive function.

zookzook · 2023-09-19T08:17:25Z

I'm not sure I fully understand the implications. Could you give me a code example (perhaps a link to library source on GitHub) where this is happening?

Check this module:
https://github.com/zookzook/elixir-mongodb-driver/blob/master/lib/mongo/stream.ex

rhcarvalho · 2023-09-19T09:25:52Z

I'm not sure I fully understand the implications. Could you give me a code example (perhaps a link to library source on GitHub) where this is happening?

Check this module: https://github.com/zookzook/elixir-mongodb-driver/blob/master/lib/mongo/stream.ex

Is it that exec_command_session returns a "document" that gets decoded the same way that a document within a cursor gets decoded?

From the Mongo.Stream perspective, I couldn't find where the cursor docs are expected to be maps 😕
Looks like docs is an "untouched" list of anything.

Then looking at

elixir-mongodb-driver/lib/mongo.ex

Lines 1504 to 1510 in 9e4d569

    
           def exec_command_session(session, cmd, opts) do 
        
             with {:ok, conn, new_cmd} <- Session.bind_session(session, cmd), 
        
                  {:ok, _cmd, response} <- DBConnection.execute(conn, %Query{action: {:command, new_cmd}}, [], opts), 
        
                  :ok <- Session.update_session(session, response, opts), 
        
                  {:ok, {_flags, doc}} <- check_for_error(response, cmd, opts) do 
        
               {:ok, doc} 
        
             else

I could guess that DBConnection.execute is the thing that eventually returns "doc", but couldn't find where BSON.Decoder is used in the flow going from a query to MongoDB.... I need to fill in some knowledge gaps :)

(...)

I did find https://github.com/zookzook/elixir-mongodb-driver/blob/preserve-order/lib/mongo/messages.ex, so the "messages from Mongo server" are treated the same as the documents within a query cursor, is that right?

So if the result of running a command would not be represented as a map, then the library would error out trying to access map keys or doing pattern matching.

rhcarvalho · 2023-09-19T09:30:44Z

In the meantime, I wrote the recursive function:

defmodule Mongo.Document do
  def to_list(doc, order_key \\ :order) when is_map(doc) and not is_struct(doc) do
    doc
    |> Map.pop(order_key, [])
    |> then(fn {order, doc} ->
      doc
      |> Map.to_list()
      |> Enum.sort_by(fn {key, _} -> Enum.find_index(order, &(&1 == key)) end)
    end)
    |> Enum.map(fn
      {key, value} when is_map(value) and not is_struct(value) -> {key, to_list(value, order_key)}
      {key, value} -> {key, value}
    end)
  end
end

That could help me (or someone else in the future) go from the proposed map + :order key to an ordered collection:

Mongo.find(mongo, "Message", %{},
  sort: %{_id: -1},
  limit: 10
)
|> Enum.map(&Mongo.Document.to_list/1)

zookzook · 2023-09-19T09:32:33Z

From the Mongo.Stream perspective, I couldn't find where the cursor docs are expected to be maps 😕
Looks like docs is an "untouched" list of anything.

No, there are a lot of pattern matches:

 with {:ok, %{"cursorsAlive" => [], "cursorsNotFound" => [], "cursorsUnknown" => [], "ok" => ok}} when ok == 1 <- Mongo.exec_command_session(session, cmd, stream.opts) do
        :ok
      end

I did find https://github.com/zookzook/elixir-mongodb-driver/blob/preserve-order/lib/mongo/messages.ex, so the "messages from Mongo server" are treated the same as the documents within a query cursor, is that right?

Yes, the response is like a wrapper document that contains the "real" documents.

So if the result of running a command would not be represented as a map, then the library would error out trying to access map keys or doing pattern matching.

Yes, that's right!

zookzook · 2023-09-19T10:02:06Z

What about:

  def to_list(doc, order_key \\ :order) do
    doc
    |> Map.get(order_key, [])
    |> Enum.map(fn key -> {key, Map.get(doc, key)} end)
    |> Enum.map(fn {key, value} -> {key, eval(value, order_key)} end)
  end

  defp eval(%{__struct__: _other} = elem, _order_key) do
    elem
  end

  defp eval(doc, order_key) when is_map(doc) do
    to_list(doc, order_key)
  end

  defp eval(xs, order_key) when is_list(xs) do
    Enum.map(xs, fn elem -> eval(elem, order_key) end)
  end

  defp eval(elem, _order_key) do
    elem
  end

the find-index-function may result in a bottleneck.

rhcarvalho

Hey Michael, thanks again for this PR ❤️

Thanks to you I understand a bit more of the underlying implementation of the library and can achieve what I wanted for exploring Mongo collections 💯

I wanted to leave a few suggestions for the documentation if this ends up in a future release.

rhcarvalho · 2023-09-19T09:53:15Z

samples/decoder.exs

I'm guessing this as an experiment to compare benchmarks with old vs new decoder 😄

How is this file meant to be used? Should it be in the existing ./examples directory instead of a new one?

The doc is represented as a map, so encoding it doesn't preserve any well-defined order. Same for the documents within items -- IIUC the fields :id, :age and :name could end up serialized in any order and the order could even be different for different items.

Yes, that's right. I need to remove it:

Original encoder 3 9.54 K 104.79 μs ±6.53% 103.79 μs 130.19 μs Original encoder 2 9.53 K 104.93 μs ±6.48% 104.13 μs 129.92 μs Original encoder 9.50 K 105.24 μs ±7.51% 103.63 μs 138.92 μs Original encoder 1 9.26 K 107.97 μs ±19.33% 105.75 μs 150.14 μs Original encoder 3 15.00 K 66.66 μs ±7.15% 66.13 μs 76.91 μs Original encoder 2 14.99 K 66.72 μs ±8.32% 66.21 μs 76.29 μs Original encoder 1 14.97 K 66.82 μs ±7.40% 66.25 μs 76.78 μs Original encoder 14.96 K 66.85 μs ±7.80% 66.33 μs 76.67 μs

The new encode is faster than the old one.

lib/mongo/messages.ex

README.md

rhcarvalho · 2023-09-19T10:02:57Z

README.md

+end
+
+defmodule BSON.PreserverOrderDecoder do
+  @moduledoc false


Would you be willing to post the relevant docs to this module in Hexdocs.pm? I'd volunteer to help write it, since getting docs either on hexdocs or IDE/language server would help discover this feature and decide whether one needs it or not.

Yes, of course.

README.md

rhcarvalho · 2023-09-19T10:05:18Z

README.md

+  decoder: MyPreserverOrderDecoder
+
+```
+The decode module is defined at compiler time. The driver provides two types of decoder:


Suggested change

The decode module is defined at compiler time. The driver provides two types of decoder:

The decoder module is defined at compile time. The driver provides two types of decoder:

rhcarvalho · 2023-09-19T10:26:38Z

What about: (...) the find-index-function may result in a bottleneck.

Nice! I didn't consider performance. For my current needs it doesn't really matter. I'm running one-off queries on Livebook :)

Your version reads much better than mine.

Neither of us wrote tests, though 😊! I did manual-test with a few queries -- and while handling the ObjectId struct case I realized that depending on configuration, the :order key may or may not be present.

I think your to_list would return an empty list when the order_key is missing? One extra branching there to handle the case of a missing order_key might do it.

Co-authored-by: Rodolfo Carvalho <[email protected]>

zookzook · 2023-09-19T10:53:05Z

Perhaps we should rename the default name :order to :__order__ to protect it from collisions.

rhcarvalho · 2023-09-22T09:33:06Z

Hey, I still have this in mind, just bandwidth has been limited :)

Mapping out our next steps here:

rename the default name :order to :__order__ to protect it from collisions
~~public~~ documentation for decoders @rhcarvalho (update: the modules have only internal documentation, their existence if however documented in the README)
provide in documentation an example function mapping "maps with order field" to ordered lists (based on what we've already wrote in previous comments) @rhcarvalho
remove or document benchmarks (?)
anything else I missed? Feel free to edit.

By the way, today I learned about String.to_existing_atom through https://erlef.github.io/security-wg/web_app_security_best_practices_beam/common_web_application_vulnerabilities#atom-exhaustion, so that could be used in any example we may end up writing.

rhcarvalho · 2023-10-19T09:31:38Z

I've solved the merge conflict in a local branch and I'm writing docs.

Wanted an opinion regarding configuring an alternative key instead of :__order__.

I thought exposing BSON.DecoderGenerator in docs might be premature/unnecessary, and we'd have more flexibility changing or removing it in future versions if it was kept as an implementation detail for now. I think less concepts also makes the driver easier to use.

What do you think of

Use default :__order__

config :mongodb_driver,
  decoder: BSON.OrderPreservingDecoder

Set custom key

config :mongodb_driver,
  decoder: {BSON.OrderPreservingDecoder, key: :original_order}

I also briefly considered not introducing BSON.OrderPreservingDecoder at all, but just some driver-level config or config to the default decoder, but didn't quite like it after all... choosing good names is always hard :)

rhcarvalho · 2023-10-19T09:34:32Z

Ah, worth contrasting the above suggestion with what we have currently:

If you want to change the :order key then define a new decoder module:
defmodule MyPreserverOrderDecoder do
  @moduledoc false

  use BSON.DecoderGenerator, preserve_order: :the_key_order
end

(and on top of creating a module, one would still need to update the config too!)

rhcarvalho · 2023-10-19T15:37:28Z

(I was reading a blog post looking for inspiration...)

A disadvantage of setting the decoder globally is that we cannot choose to open two connections with different behaviors, they are forced to use a single decoder implementation.

In tests we can still test each decoder module independently, that's okay -- I wrote some basic tests for each of the implementations we have, with and without order.

@zookzook what if we make the decoder configurable per connection? Apparently it would require wiring things together in Mongo, Mongo.Messages and Mongo.MongoDBConnection.

* fix: handle read preferences without tags * Fixed minor bug parsing url connection. Updated dependencies libraries. (#209) * Fixed minor bug parsing url connection. Updated dependencies libraries. * Incremented max_nesting option in credo refactor nestring * Changed function ReadPreference.primary() by ReadPreference.merge_defaults() in topology test file --------- Co-authored-by: Juan Antonio Jiménez <[email protected]> Co-authored-by: Michael Maier <[email protected]> * fix: applies the global timeout value to each query (#215) * Document BSON.Decoder and BSON.PreserveOrderDecoder But keep the documentation for those modules private for now. Expand the documentation on Data Representation and how to deal with preserving key order when encoding and decoding data. * Test BSON decoders * Document how to go from annotated maps with order keys to lists --------- Co-authored-by: zookzook <[email protected]> Co-authored-by: Juan Antonio <[email protected]> Co-authored-by: Juan Antonio Jiménez <[email protected]>

zookzook self-assigned this Sep 18, 2023

fix: preserve the order of the keys

c87a078

zookzook force-pushed the preserve-order branch from a6a5a8f to c87a078 Compare September 18, 2023 09:20

rhcarvalho reviewed Sep 19, 2023

View reviewed changes

zookzook and others added 6 commits September 19, 2023 12:48

Update lib/mongo/messages.ex

34ad292

Co-authored-by: Rodolfo Carvalho <[email protected]>

Update lib/mongo/messages.ex

901c13f

Co-authored-by: Rodolfo Carvalho <[email protected]>

Update lib/mongo/messages.ex

9e6791d

Co-authored-by: Rodolfo Carvalho <[email protected]>

Update README.md

dced77f

Co-authored-by: Rodolfo Carvalho <[email protected]>

Update lib/mongo/messages.ex

a1aec86

Co-authored-by: Rodolfo Carvalho <[email protected]>

Update README.md

829dbe1

Co-authored-by: Rodolfo Carvalho <[email protected]>

zookzook added 2 commits September 22, 2023 14:33

removed the benchmark code

0ceb046

renamed the predefined order key

ecd8da0

rhcarvalho mentioned this pull request Oct 19, 2023

Preserve order #220

Merged

zookzook merged commit b360d87 into master Nov 26, 2023
3 checks passed

zookzook deleted the preserve-order branch November 26, 2023 12:12

github-actions bot mentioned this pull request Nov 26, 2023

chore(master): release 1.2.1 #210

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: preserve the order of the keys #211

fix: preserve the order of the keys #211

zookzook commented Sep 18, 2023

rhcarvalho commented Sep 18, 2023

zookzook commented Sep 19, 2023

rhcarvalho commented Sep 19, 2023

zookzook commented Sep 19, 2023

rhcarvalho commented Sep 19, 2023

rhcarvalho commented Sep 19, 2023

zookzook commented Sep 19, 2023

zookzook commented Sep 19, 2023

rhcarvalho left a comment

rhcarvalho Sep 19, 2023

zookzook Sep 19, 2023

rhcarvalho Sep 19, 2023

zookzook Sep 19, 2023

rhcarvalho Sep 19, 2023

rhcarvalho commented Sep 19, 2023

zookzook commented Sep 19, 2023

rhcarvalho commented Sep 22, 2023 •

edited

Loading

rhcarvalho commented Oct 19, 2023

rhcarvalho commented Oct 19, 2023

rhcarvalho commented Oct 19, 2023

	The decode module is defined at compiler time. The driver provides two types of decoder:
	The decoder module is defined at compile time. The driver provides two types of decoder:

fix: preserve the order of the keys #211

fix: preserve the order of the keys #211

Conversation

zookzook commented Sep 18, 2023

rhcarvalho commented Sep 18, 2023

Setup

Query with explicit projection

Query without a projection

zookzook commented Sep 19, 2023

rhcarvalho commented Sep 19, 2023

zookzook commented Sep 19, 2023

rhcarvalho commented Sep 19, 2023

rhcarvalho commented Sep 19, 2023

zookzook commented Sep 19, 2023

zookzook commented Sep 19, 2023

rhcarvalho left a comment

Choose a reason for hiding this comment

rhcarvalho Sep 19, 2023

Choose a reason for hiding this comment

zookzook Sep 19, 2023

Choose a reason for hiding this comment

rhcarvalho Sep 19, 2023

Choose a reason for hiding this comment

zookzook Sep 19, 2023

Choose a reason for hiding this comment

rhcarvalho Sep 19, 2023

Choose a reason for hiding this comment

rhcarvalho commented Sep 19, 2023

zookzook commented Sep 19, 2023

rhcarvalho commented Sep 22, 2023 • edited Loading

rhcarvalho commented Oct 19, 2023

rhcarvalho commented Oct 19, 2023

rhcarvalho commented Oct 19, 2023

rhcarvalho commented Sep 22, 2023 •

edited

Loading